Contents

This is part of a bigger study you can find here

1 Alcohol liver disease

1.1 Load disease expression

The expression data is provided as counts in a tab separate value format. In another file there is the phenoData associated to the samples.

count <- read.delim("../data/ALD_ramon/refseq_counts_ALD.tsv", row.names = 1)
phenoData <- read.csv("../data/ALD_ramon/NGS_AHsteps.design.csv", row.names = 1)

First we will explore a little bit the data, in case we need to correct something:

library("geneplotter")
summary(phenoData)
##      Group         Sample.id        L                               G     
##  E      :12   INTEAM_A01: 1   E      :12   Early.ASH                 :12  
##  C      :11   INTEAM_A02: 1   C      :11   AH.Non.severe             :11  
##  D      :10   INTEAM_A03: 1   D      :10   Explants.from.AH          :10  
##  I      :10   INTEAM_A04: 1   I      :10   Normal.livers             :10  
##  A      : 9   INTEAM_A05: 1   A      : 9   AH.Severe.-.non-responders: 9  
##  B      : 9   INTEAM_A06: 1   B      : 9   AH.Severe.-.responders    : 9  
##  (Other):27   (Other)   :82   (Other):27   (Other)                   :27
phenoData
##     Group  Sample.id L                          G
## A01     A INTEAM_A01 A     AH.Severe.-.responders
## A02     A INTEAM_A02 A     AH.Severe.-.responders
## A03     A INTEAM_A03 A     AH.Severe.-.responders
## A04     A INTEAM_A04 A     AH.Severe.-.responders
## A05     A INTEAM_A05 A     AH.Severe.-.responders
## A06     A INTEAM_A06 A     AH.Severe.-.responders
## A07     A INTEAM_A07 A     AH.Severe.-.responders
## A08     A INTEAM_A08 A     AH.Severe.-.responders
## A09     A INTEAM_A09 A     AH.Severe.-.responders
## B10     B INTEAM_B10 B AH.Severe.-.non-responders
## B11     B INTEAM_B11 B AH.Severe.-.non-responders
## B12     B INTEAM_B12 B AH.Severe.-.non-responders
## B13     B INTEAM_B13 B AH.Severe.-.non-responders
## B14     B INTEAM_B14 B AH.Severe.-.non-responders
## B15     B INTEAM_B15 B AH.Severe.-.non-responders
## B16     B INTEAM_B16 B AH.Severe.-.non-responders
## B17     B INTEAM_B17 B AH.Severe.-.non-responders
## B18     B INTEAM_B18 B AH.Severe.-.non-responders
## C19     C INTEAM_C19 C              AH.Non.severe
## C20     C INTEAM_C20 C              AH.Non.severe
## C21     C INTEAM_C21 C              AH.Non.severe
## C22     C INTEAM_C22 C              AH.Non.severe
## C23     C INTEAM_C23 C              AH.Non.severe
## C24     C INTEAM_C24 C              AH.Non.severe
## C25     C INTEAM_C25 C              AH.Non.severe
## C26     C INTEAM_C26 C              AH.Non.severe
## C27     C INTEAM_C27 C              AH.Non.severe
## C28     C INTEAM_C28 C              AH.Non.severe
## C29     C INTEAM_C29 C              AH.Non.severe
## D30     D INTEAM_D30 D           Explants.from.AH
## D31     D INTEAM_D31 D           Explants.from.AH
## D32     D INTEAM_D32 D           Explants.from.AH
## D34     D INTEAM_D34 D           Explants.from.AH
## D35     D INTEAM_D35 D           Explants.from.AH
## D36     D INTEAM_D36 D           Explants.from.AH
## D37     D INTEAM_D37 D           Explants.from.AH
## D38     D INTEAM_D38 D           Explants.from.AH
## D39     D INTEAM_D39 D           Explants.from.AH
## D40     D INTEAM_D40 D           Explants.from.AH
## E41     E INTEAM_E41 E                  Early.ASH
## E42     E INTEAM_E42 E                  Early.ASH
## E43     E INTEAM_E43 E                  Early.ASH
## E44     E INTEAM_E44 E                  Early.ASH
## E45     E INTEAM_E45 E                  Early.ASH
## E46     E INTEAM_E46 E                  Early.ASH
## E47     E INTEAM_E47 E                  Early.ASH
## E48     E INTEAM_E48 E                  Early.ASH
## E49     E INTEAM_E49 E                  Early.ASH
## E50     E INTEAM_E50 E                  Early.ASH
## E51     E INTEAM_E51 E                  Early.ASH
## E52     E INTEAM_E52 E                  Early.ASH
## F53     F INTEAM_F53 F                Hepatitis.C
## F55     F INTEAM_F55 F                Hepatitis.C
## F56     F INTEAM_F56 F                Hepatitis.C
## F57     F INTEAM_F57 F                Hepatitis.C
## F58     F INTEAM_F58 F                Hepatitis.C
## F59     F INTEAM_F59 F                Hepatitis.C
## F60     F INTEAM_F60 F                Hepatitis.C
## F61     F INTEAM_F61 F                Hepatitis.C
## F62     F INTEAM_F62 F                Hepatitis.C
## G63     G INTEAM_G63 G                       NASH
## G64     G INTEAM_G64 G                       NASH
## G65     G INTEAM_G65 G                       NASH
## G66     G INTEAM_G66 G                       NASH
## G67     G INTEAM_G67 G                       NASH
## G68     G INTEAM_G68 G                       NASH
## G69     G INTEAM_G69 G                       NASH
## G70     G INTEAM_G70 G                       NASH
## G71     G INTEAM_G71 G                       NASH
## H72     H INTEAM_H72 H      Compensated.cirrhosis
## H73     H INTEAM_H73 H      Compensated.cirrhosis
## H74     H INTEAM_H74 H      Compensated.cirrhosis
## H75     H INTEAM_H75 H      Compensated.cirrhosis
## H76     H INTEAM_H76 H      Compensated.cirrhosis
## H77     H INTEAM_H77 H      Compensated.cirrhosis
## H78     H INTEAM_H78 H      Compensated.cirrhosis
## H79     H INTEAM_H79 H      Compensated.cirrhosis
## H80     H INTEAM_H80 H      Compensated.cirrhosis
## I89     I INTEAM_I89 I              Normal.livers
## I90     I INTEAM_I90 I              Normal.livers
## I91     I INTEAM_I91 I              Normal.livers
## I92     I INTEAM_I92 I              Normal.livers
## I93     I INTEAM_I93 I              Normal.livers
## I94     I INTEAM_I94 I              Normal.livers
## I95     I INTEAM_I95 I              Normal.livers
## I96     I INTEAM_I96 I              Normal.livers
## I97     I INTEAM_I97 I              Normal.livers
## I98     I INTEAM_I98 I              Normal.livers
count[1:5, 1:5]
##           A01 A02 A03 A04 A05
## DDX11L1     3   8   0   2   2
## WASH7P     43  82 104  66  82
## MIR6859-3   0   0   0   0   0
## MIR6859-4   0   0   0   0   0
## MIR6859-1   0   0   0   0   0

So we will delete two colums that has the same information as the column G. And we will use more easily readable names:

phenoData <- phenoData[, c("Sample.id", "G")]
colnames(phenoData) <- c("Sample", "Status")
levels(phenoData$Status) <- c("AH", "Non-responders", "Responders", "C.Comp", 
                              "ASH", "Explants", "HCV", "NASH", "Normal")
phenoData
##         Sample         Status
## A01 INTEAM_A01     Responders
## A02 INTEAM_A02     Responders
## A03 INTEAM_A03     Responders
## A04 INTEAM_A04     Responders
## A05 INTEAM_A05     Responders
## A06 INTEAM_A06     Responders
## A07 INTEAM_A07     Responders
## A08 INTEAM_A08     Responders
## A09 INTEAM_A09     Responders
## B10 INTEAM_B10 Non-responders
## B11 INTEAM_B11 Non-responders
## B12 INTEAM_B12 Non-responders
## B13 INTEAM_B13 Non-responders
## B14 INTEAM_B14 Non-responders
## B15 INTEAM_B15 Non-responders
## B16 INTEAM_B16 Non-responders
## B17 INTEAM_B17 Non-responders
## B18 INTEAM_B18 Non-responders
## C19 INTEAM_C19             AH
## C20 INTEAM_C20             AH
## C21 INTEAM_C21             AH
## C22 INTEAM_C22             AH
## C23 INTEAM_C23             AH
## C24 INTEAM_C24             AH
## C25 INTEAM_C25             AH
## C26 INTEAM_C26             AH
## C27 INTEAM_C27             AH
## C28 INTEAM_C28             AH
## C29 INTEAM_C29             AH
## D30 INTEAM_D30       Explants
## D31 INTEAM_D31       Explants
## D32 INTEAM_D32       Explants
## D34 INTEAM_D34       Explants
## D35 INTEAM_D35       Explants
## D36 INTEAM_D36       Explants
## D37 INTEAM_D37       Explants
## D38 INTEAM_D38       Explants
## D39 INTEAM_D39       Explants
## D40 INTEAM_D40       Explants
## E41 INTEAM_E41            ASH
## E42 INTEAM_E42            ASH
## E43 INTEAM_E43            ASH
## E44 INTEAM_E44            ASH
## E45 INTEAM_E45            ASH
## E46 INTEAM_E46            ASH
## E47 INTEAM_E47            ASH
## E48 INTEAM_E48            ASH
## E49 INTEAM_E49            ASH
## E50 INTEAM_E50            ASH
## E51 INTEAM_E51            ASH
## E52 INTEAM_E52            ASH
## F53 INTEAM_F53            HCV
## F55 INTEAM_F55            HCV
## F56 INTEAM_F56            HCV
## F57 INTEAM_F57            HCV
## F58 INTEAM_F58            HCV
## F59 INTEAM_F59            HCV
## F60 INTEAM_F60            HCV
## F61 INTEAM_F61            HCV
## F62 INTEAM_F62            HCV
## G63 INTEAM_G63           NASH
## G64 INTEAM_G64           NASH
## G65 INTEAM_G65           NASH
## G66 INTEAM_G66           NASH
## G67 INTEAM_G67           NASH
## G68 INTEAM_G68           NASH
## G69 INTEAM_G69           NASH
## G70 INTEAM_G70           NASH
## G71 INTEAM_G71           NASH
## H72 INTEAM_H72         C.Comp
## H73 INTEAM_H73         C.Comp
## H74 INTEAM_H74         C.Comp
## H75 INTEAM_H75         C.Comp
## H76 INTEAM_H76         C.Comp
## H77 INTEAM_H77         C.Comp
## H78 INTEAM_H78         C.Comp
## H79 INTEAM_H79         C.Comp
## H80 INTEAM_H80         C.Comp
## I89 INTEAM_I89         Normal
## I90 INTEAM_I90         Normal
## I91 INTEAM_I91         Normal
## I92 INTEAM_I92         Normal
## I93 INTEAM_I93         Normal
## I94 INTEAM_I94         Normal
## I95 INTEAM_I95         Normal
## I96 INTEAM_I96         Normal
## I97 INTEAM_I97         Normal
## I98 INTEAM_I98         Normal

However we are only interested on those related to alcohol disease, so we need to subset them:

alcohol <- !phenoData$Status %in% c("HCV", "NASH", "Explants")
phenoData <- phenoData[alcohol, ]
count <- count[, colnames(count) %in% rownames(phenoData)]

We can calculate the normalization factors used to calculate the Counts per million read:

library("edgeR")
dge <- DGEList(counts = count, group = phenoData$Status)
ord <- order(dge$sample$lib.size)
barplot(dge$sample$lib.size[ord]/1e6, las = 1, ylab = "Millions of reads",
        xlab = "Samples", main = "Library size of the samples", col = phenoData$Status)
legend("topleft", legend = unique(phenoData$Status), fill = unique(phenoData$Status))

dge <- calcNormFactors(dge)

We can normalize the data to counts per millon of reads taking into account that we know the normalized library size:

expression <- cpm(dge, normalized.lib.sizes = TRUE, log = TRUE, prior.count = 3)

1.2 Check quality pre-normalizations

We can observe the quality of the samples with a boxplot and the density of expressions:

boxplot(expression, main = "Expression per sample", col = phenoData$Status)

multidensity(expression, main = "Densities of expression", legend = NULL)

avgexp <- rowMeans(expression)
hist(avgexp, main = "Histogram of mean expression per gene")
abline(v = 1, col = "red", lwd = 2)

There is a lot of differences between samples on the low expressed genes. We filter those genes that are below 1, as they are unreliable:

expression <- expression[avgexp > 1, ]
multidensity(expression, main = "Densities of expression", legend = NULL)

Now the distribution of the expression between samples is much more comparable. Following the recomendation of the edgeR package we recalculate the normalization factor without the lowly expressed genes:

dge <- calcNormFactors(dge[avgexp > 1, ])
expression <- cpm(dge, normalized.lib.sizes = TRUE, log = TRUE, prior.count = 3)

We can check how has this normalization affected the values:

boxplot(expression, main = "Expression per sample", col = phenoData$Status)

multidensity(expression, main = "Densities of expression", legend = NULL)

avgexp <- rowMeans(expression)
hist(avgexp, main = "Histogram of mean expression per gene")

Now are a little bit more comparable.
To work easier with the data I proceed to create an ExpressionSet

library("Biobase")
library("AnnotationDbi")
ALD <- ExpressionSet(as.matrix(expression))
phenoData(ALD) <- AnnotatedDataFrame(phenoData)
ALD
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 13366 features, 60 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   rowNames: A01 A02 ... I98 (60 total)
##   varLabels: Sample Status
##   varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:

1.3 Check the normalizations applied

We take advantatge of the plotMDS function of limma package to plot the MDS of the samples, to see how close are between them.

plotMDS(ALD, top = Inf, labels = phenoData(ALD)$Status, main = "Samples relationships")

We can see that they are clearly separated in two groups, but some samples like the Alcohol Hepatitis samples are on both sites of the first dimension. We can also observe that the cirrhosis compensated samples are closer to the normal or healthy samples.

1.4 Design

The design we will use to find the differentially expressed genes is the following:

# To ensure sintactically valid names
names <- make.names(levels(droplevels(phenoData(ALD)$Status))) 
design <- sapply(names, function(x){
  x <- ifelse(x == "Non.responders", "Non-responders", x)
  ifelse(phenoData(ALD)$Status == x, 1, 0)
})
rownames(design) <- rownames(phenoData(ALD))
design
##     AH Non.responders Responders C.Comp ASH Normal
## A01  0              0          1      0   0      0
## A02  0              0          1      0   0      0
## A03  0              0          1      0   0      0
## A04  0              0          1      0   0      0
## A05  0              0          1      0   0      0
## A06  0              0          1      0   0      0
## A07  0              0          1      0   0      0
## A08  0              0          1      0   0      0
## A09  0              0          1      0   0      0
## B10  0              1          0      0   0      0
## B11  0              1          0      0   0      0
## B12  0              1          0      0   0      0
## B13  0              1          0      0   0      0
## B14  0              1          0      0   0      0
## B15  0              1          0      0   0      0
## B16  0              1          0      0   0      0
## B17  0              1          0      0   0      0
## B18  0              1          0      0   0      0
## C19  1              0          0      0   0      0
## C20  1              0          0      0   0      0
## C21  1              0          0      0   0      0
## C22  1              0          0      0   0      0
## C23  1              0          0      0   0      0
## C24  1              0          0      0   0      0
## C25  1              0          0      0   0      0
## C26  1              0          0      0   0      0
## C27  1              0          0      0   0      0
## C28  1              0          0      0   0      0
## C29  1              0          0      0   0      0
## E41  0              0          0      0   1      0
## E42  0              0          0      0   1      0
## E43  0              0          0      0   1      0
## E44  0              0          0      0   1      0
## E45  0              0          0      0   1      0
## E46  0              0          0      0   1      0
## E47  0              0          0      0   1      0
## E48  0              0          0      0   1      0
## E49  0              0          0      0   1      0
## E50  0              0          0      0   1      0
## E51  0              0          0      0   1      0
## E52  0              0          0      0   1      0
## H72  0              0          0      1   0      0
## H73  0              0          0      1   0      0
## H74  0              0          0      1   0      0
## H75  0              0          0      1   0      0
## H76  0              0          0      1   0      0
## H77  0              0          0      1   0      0
## H78  0              0          0      1   0      0
## H79  0              0          0      1   0      0
## H80  0              0          0      1   0      0
## I89  0              0          0      0   0      1
## I90  0              0          0      0   0      1
## I91  0              0          0      0   0      1
## I92  0              0          0      0   0      1
## I93  0              0          0      0   0      1
## I94  0              0          0      0   0      1
## I95  0              0          0      0   0      1
## I96  0              0          0      0   0      1
## I97  0              0          0      0   0      1
## I98  0              0          0      0   0      1

As you can see each group has its own colum to increase the power to the model, and to allow for more informative contrasts.

refact <- function(x) {
  if (x == "Normal") {
    0
  } else if ( x == "ASH") {
    1
  } else if (x == "C.Comp") {
    2
  } else if (x == "AH") {
    4
  } else if (x == "Responders") {
    5
  } else if (x == "Non-responders") {
    5
  } else {
    stop("Unexpected level")
  }
}
design2 <- sapply(as.character(phenoData(ALD)$Status), refact)
design2 <- as.matrix(design2)
design2 <- cbind(rep(1, ncol(design2)), design2)
colnames(design2) <- c("(Intercept)", "Progression")
rownames(design2) <- rownames(phenoData(ALD))
design2
##     (Intercept) Progression
## A01           1           5
## A02           1           5
## A03           1           5
## A04           1           5
## A05           1           5
## A06           1           5
## A07           1           5
## A08           1           5
## A09           1           5
## B10           1           5
## B11           1           5
## B12           1           5
## B13           1           5
## B14           1           5
## B15           1           5
## B16           1           5
## B17           1           5
## B18           1           5
## C19           1           4
## C20           1           4
## C21           1           4
## C22           1           4
## C23           1           4
## C24           1           4
## C25           1           4
## C26           1           4
## C27           1           4
## C28           1           4
## C29           1           4
## E41           1           1
## E42           1           1
## E43           1           1
## E44           1           1
## E45           1           1
## E46           1           1
## E47           1           1
## E48           1           1
## E49           1           1
## E50           1           1
## E51           1           1
## E52           1           1
## H72           1           2
## H73           1           2
## H74           1           2
## H75           1           2
## H76           1           2
## H77           1           2
## H78           1           2
## H79           1           2
## H80           1           2
## I89           1           0
## I90           1           0
## I91           1           0
## I92           1           0
## I93           1           0
## I94           1           0
## I95           1           0
## I96           1           0
## I97           1           0
## I98           1           0

Normalize the expression by calculating an appropriate observation weight.

v1 <- voom(dge, design, plot = TRUE, normalize.method = "cyclicloess")

v2 <- voom(dge, design2, plot = TRUE, normalize.method = "cyclicloess")

Note that even if we filter those with low expression we assign the correct library size, to avoid over estimation.
We store the normalized expression and the pehnotypic data for uses on the network construction step.

save(v1, v2, phenoData, file = "Expression_ALD.RData")

1.5 Estimate surrogate variables

We can further estimate surrogate variables with the package sva:

library("sva")
sv1 <- sva(v1$E, design)
## Number of significant surrogate variables is:  9 
## Iteration (out of 5 ):1  2  3  4  5
sv2 <- sva(v2$E, design2)
## Number of significant surrogate variables is:  9 
## Iteration (out of 5 ):1  2  3  4  5

We can append the estimated surrogated variables to the design matrix for a better estimation of the effect of each paramter:

i.design <- cbind(design, sv1$sv)
colnames(i.design) <- c(colnames(design), paste0("X", 1:ncol(sv1$sv)))
i.design2 <- cbind(design2, sv2$sv)
colnames(i.design2) <- c(colnames(design2), paste0("X", 1:ncol(sv2$sv)))

We fit each design with the expression of the alcoholic liver disease:

fit <- lmFit(v1$E, i.design)
## Coefficients not estimable: X9
## Warning: Partial NA coefficients for 13366 probe(s)
fit.2 <- lmFit(v2$E, i.design2)

1.6 Contrasts

We are interested in several comparisons, in the first design:

contrasts0 <- makeContrasts(
  "ASHvsNormal" = ASH - Normal,
  "CirrhosisVsNormal" = C.Comp - Normal,
  "AHvsNormal" = AH - Normal,
  "RespondersVsNormal" = Responders - Normal,
  "Non.respondersVsNormal" = Non.responders - Normal,
  
  # Assuming that some status are similar
  "SevereVsNormal" = (Responders + Non.responders)/2 - Normal,
  
  "RespondersVsNon.responders" = Responders - Non.responders,
  
  "CirrhosisVsASH" = C.Comp - ASH,
  "AHvsCirrhosis" = AH - C.Comp,
  
  "Non.respondersVsAH" = Non.responders - AH,
  "RespondersVsAH" = Responders - AH,
  "SevereVsHepatitis" = (Non.responders + Responders)/2 - AH,
  
  levels = i.design)
head(contrasts0)
##                 Contrasts
## Levels           ASHvsNormal CirrhosisVsNormal AHvsNormal
##   AH                       0                 0          1
##   Non.responders           0                 0          0
##   Responders               0                 0          0
##   C.Comp                   0                 1          0
##   ASH                      1                 0          0
##   Normal                  -1                -1         -1
##                 Contrasts
## Levels           RespondersVsNormal Non.respondersVsNormal SevereVsNormal
##   AH                              0                      0            0.0
##   Non.responders                  0                      1            0.5
##   Responders                      1                      0            0.5
##   C.Comp                          0                      0            0.0
##   ASH                             0                      0            0.0
##   Normal                         -1                     -1           -1.0
##                 Contrasts
## Levels           RespondersVsNon.responders CirrhosisVsASH AHvsCirrhosis
##   AH                                      0              0             1
##   Non.responders                         -1              0             0
##   Responders                              1              0             0
##   C.Comp                                  0              1            -1
##   ASH                                     0             -1             0
##   Normal                                  0              0             0
##                 Contrasts
## Levels           Non.respondersVsAH RespondersVsAH SevereVsHepatitis
##   AH                             -1             -1              -1.0
##   Non.responders                  1              0               0.5
##   Responders                      0              1               0.5
##   C.Comp                          0              0               0.0
##   ASH                             0              0               0.0
##   Normal                          0              0               0.0

Given those comparisons of interest we now evaluate the results:

fit2 <- contrasts.fit(fit, contrasts0)
fit2 <- eBayes(fit2, proportion = 0.1)
results <- decideTests(fit2, adjust.method = "fdr", lfc = log2(2))
summary(results)
##    ASHvsNormal CirrhosisVsNormal AHvsNormal RespondersVsNormal
## -1         422               500        747                830
## 0        12576             12351      11774              11916
## 1          368               515        845                620
##    Non.respondersVsNormal SevereVsNormal RespondersVsNon.responders
## -1                   1350           1314                          0
## 0                   10618          10721                      13366
## 1                    1398           1331                          0
##    CirrhosisVsASH AHvsCirrhosis Non.respondersVsAH RespondersVsAH
## -1            182           511                190              0
## 0           12906         12269              13131          13366
## 1             278           586                 45              0
##    SevereVsHepatitis
## -1                63
## 0              13298
## 1                  5
fit2.2 <- eBayes(fit.2, proportion = 0.75) 
results2 <- decideTests(fit2.2, adjust.method = "fdr", lfc = log2(2))
summary(results2)
##    (Intercept) Progression    X1    X2    X3    X4    X5    X6    X7    X8
## -1          10          10  3452  2318   400  1820  1739   579   443   169
## 0          489       13349  6273  8575 12732  9971 10024 12358 12451 13016
## 1        12867           7  3641  2473   234  1575  1603   429   472   181
##       X9
## -1   553
## 0  12310
## 1    503

We can observe the quality of the t-values over the teoretical quantiles to observe if there is any assumption about the eBayes fitting which doesn’t holds. Note that I already modified the expected proportion of genes in the call to eBayes:

par(mfrow = c(ncol(results), 3))
out <- sapply(colnames(results), function(x){
  qqt(fit2$t[, x], df = fit2$df.total, main = paste("Student's t Q-Q Plot of", x))
  abline(0, 1)
  volcanoplot(fit2, coef = x, main = x)
  plotMD(fit2, coef = x, main = x)
})

We can visualize the same for the other model we have:

par(mfrow = c(1, 3))
qqt(fit2.2$t[, "Progression"], df = fit2.2$df.total, 
    main = "Student's t Q-Q Plot of Progression")
abline(0, 1)
volcanoplot(fit2.2, coef = "Progression", main = "Progression")
plotMD(fit2.2, coef = "Progression", main = "Progression")

Here we can see on the last plot that the expression of the genes show a bimodal distribution in the progression of the dissease. Which seems to confirm that at one point it descompensates and has a very bad prognossis.

par(mfrow = c(1, 2))
plotSA(fit2, main = "First model: Mean−variance trend")
plotSA(fit2.2, main = "Progression model: Mean−variance trend")

1.7 Store data

It is a good idea to store the data, not only the program of your analysis, so here I go:

save(fit, fit2, fit2.2, fit.2, design, design2, i.design, i.design2, 
     dge, ALD, contrasts0, v1, v2, file = "ALD.RData")

1.8 Differentially expressed genes

We can plot the top 2000 genes with the highest absolute value of log fold-change in each contrast just for informative purposes:

par(mfrow = c(ncol(contrasts0), 2))
out <- sapply(colnames(contrasts0), function(x) {
  tt.ALD <- topTable(fit2, coef = x, sort.by = "logFC", number = Inf)
  tt.ALD <- tt.ALD[order(-abs(tt.ALD$logFC)), ]
  plot(density(tt.ALD[1:2000, "logFC"]), main = paste("Density of", x))
  hist(tt.ALD[1:2000, "logFC"], main = paste("Histogram of", x), xlab = "logFC")
})

For the progression model we can plot it too those top 2000 significative at the threshold of 0.05:

par(mfrow = c(1, 2))
tt.ALD2 <- topTable(fit2.2, coef = "Progression", sort.by = "logFC", 
                    number = Inf)
tt.ALD2 <- tt.ALD2[order(-abs(tt.ALD2$logFC)), ]
signif <- tt.ALD2[tt.ALD2$adj.P.Val < 0.05, ] # Subset of significant p-value
plot(density(signif[1:2000, "logFC"]), main = "Density")
hist(signif[1:2000, "logFC"], main = "Histogram", xlab = "logFC")

In order to store them we proceed with:

write.csv(tt.ALD2, file = "ALD_Progression.csv", na = "", row.names = FALSE)

We stored all the table with 13366 genes, even though some are not significant.

These are the genes relevant of the alocholic liver disease progression. There are thus 1038 genes overexpressed in alcoholic liver disease and 962 downregulated in the alcoholic liver disease, which are the main contributers to the disease progression.

We can also see which genes are differentially expressed in each phase, and which are shared:

vennDiagram(results[, c(1:3, 6)], include = "up",
            circle.col = c("red", "green", "black", "blue"), 
            main = "Genes Up-regulated in the disease")

vennDiagram(results[, c(1:3, 6)], include = "down",
            circle.col = c("red", "green", "black", "blue"), 
            main = "Genes Down-regulated in the disease")

We can see that there are 299 genes which are differencially expressed significantly with an adjusted p-value below 0.05.

But we can plot it sepparatedly for each step, for a better comparison:

vennDiagram(results[, c(1, 2)], include = "up", 
            circle.col = c("green", "black"), 
            main = "Genes Up-regulated shared")

vennDiagram(results[, c(1, 2)], include = "down", 
            circle.col = c("green", "black"), 
            main = "Genes Down-regulated")


vennDiagram(results[, c(2, 3)], include = "up", 
            circle.col = c("green", "black"), 
            main = "Genes Up-regulated shared")

vennDiagram(results[, c(2, 3)], include = "down", 
            circle.col = c("green", "black"), 
            main = "Genes Down-regulated")


vennDiagram(results[, c(3, 6)], include = "up", 
            circle.col = c("green", "black"), 
            main = "Genes Up-regulated shared")

vennDiagram(results[, c(3, 6)], include = "down", 
            circle.col = c("green", "black"), 
            main = "Genes Down-regulated")

1.9 Functional enrichment

We can group genes by function using geneSCF, which is a program written in perl. As input it requires a list of genes with each line the name of the gene:

fileConn <- file("genes_list.txt")
writeLines(rownames(signif), fileConn)
close(fileConn)

We run the program with the following command, we asume that there are 20000 genes in humans, and we group them by pathway in REACTOME:

./geneSCF -m=normal -i=genes_list.txt -t=sym -db=REACTOME -o=. -bg=20000 -org=Hs -p=yes

1.9.1 topGO

We can also group them by GO:

library("topGO")
allGenes <- tt.ALD2[, "adj.P.Val"]
names(allGenes) <- rownames(tt.ALD2)
topDiffGenes <- function(x) {
  return(x <= 0.05)
}
GOdata.bp <- new("topGOdata",
                 ontology = "BP",
                 description = "Biological process of the signature module.",
                 allGenes = allGenes,
                 annot = annFUN.org,
                 ID = "symbol",
                 mapping = "org.Hs.eg",
                 geneSel = topDiffGenes,
                 nodeSize = 5)
## 
## Building most specific GOs .....
##  ( 10017 GO terms found. )
## 
## Build GO DAG topology ..........
##  ( 13905 GO terms and 32890 relations. )
## 
## Annotating nodes ...............
##  ( 11178 genes annotated to the GO terms. )
save(GOdata.bp, file = "GO_ALD.RData")
resultFisher <- runTest(GOdata.bp, algorithm = "classic", statistic = "fisher")
## 
##           -- Classic Algorithm -- 
## 
##       the algorithm is scoring 7509 nontrivial nodes
##       parameters: 
##           test statistic: fisher
resultKS.weight <- runTest(GOdata.bp, algorithm = 'weight01', statistic = "ks")
## 
##           -- Weight01 Algorithm -- 
## 
##       the algorithm is scoring 7509 nontrivial nodes
##       parameters: 
##           test statistic: ks
##           score order: increasing
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  10 nodes to be scored   (17 eliminated genes)
## 
##   Level 16:  25 nodes to be scored   (48 eliminated genes)
## 
##   Level 15:  79 nodes to be scored   (105 eliminated genes)
## 
##   Level 14:  164 nodes to be scored  (296 eliminated genes)
## 
##   Level 13:  305 nodes to be scored  (826 eliminated genes)
## 
##   Level 12:  448 nodes to be scored  (1711 eliminated genes)
## 
##   Level 11:  670 nodes to be scored  (3337 eliminated genes)
## 
##   Level 10:  871 nodes to be scored  (4737 eliminated genes)
## 
##   Level 9:   1055 nodes to be scored (6652 eliminated genes)
## 
##   Level 8:   1064 nodes to be scored (7965 eliminated genes)
## 
##   Level 7:   1077 nodes to be scored (9007 eliminated genes)
## 
##   Level 6:   869 nodes to be scored  (9770 eliminated genes)
## 
##   Level 5:   530 nodes to be scored  (10311 eliminated genes)
## 
##   Level 4:   249 nodes to be scored  (10679 eliminated genes)
## 
##   Level 3:   63 nodes to be scored   (10832 eliminated genes)
## 
##   Level 2:   20 nodes to be scored   (10961 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (11019 eliminated genes)
resultKS.elim <- runTest(GOdata.bp, algorithm = "elim", statistic = "ks")
## 
##           -- Elim Algorithm -- 
## 
##       the algorithm is scoring 7509 nontrivial nodes
##       parameters: 
##           test statistic: ks
##           cutOff: 0.01
##           score order: increasing
## 
##   Level 19:  3 nodes to be scored    (0 eliminated genes)
## 
##   Level 18:  6 nodes to be scored    (0 eliminated genes)
## 
##   Level 17:  10 nodes to be scored   (0 eliminated genes)
## 
##   Level 16:  25 nodes to be scored   (0 eliminated genes)
## 
##   Level 15:  79 nodes to be scored   (0 eliminated genes)
## 
##   Level 14:  164 nodes to be scored  (94 eliminated genes)
## 
##   Level 13:  305 nodes to be scored  (135 eliminated genes)
## 
##   Level 12:  448 nodes to be scored  (555 eliminated genes)
## 
##   Level 11:  670 nodes to be scored  (619 eliminated genes)
## 
##   Level 10:  871 nodes to be scored  (1377 eliminated genes)
## 
##   Level 9:   1055 nodes to be scored (1701 eliminated genes)
## 
##   Level 8:   1064 nodes to be scored (2520 eliminated genes)
## 
##   Level 7:   1077 nodes to be scored (2910 eliminated genes)
## 
##   Level 6:   869 nodes to be scored  (3638 eliminated genes)
## 
##   Level 5:   530 nodes to be scored  (4625 eliminated genes)
## 
##   Level 4:   249 nodes to be scored  (5710 eliminated genes)
## 
##   Level 3:   63 nodes to be scored   (6235 eliminated genes)
## 
##   Level 2:   20 nodes to be scored   (6628 eliminated genes)
## 
##   Level 1:   1 nodes to be scored    (6628 eliminated genes)

allRes <- GenTable(GOdata.bp, classic = resultFisher, weight01 = resultKS.weight,
                   elim = resultKS.elim, orderBy = "weight01", topNodes = 1000)
allRes <- allRes[allRes$weight01 < 0.05, ]
write.csv(allRes, file = "GO_ALD_progression.csv", row.names = FALSE)
head(allRes)
##         GO.ID                                 Term Annotated Significant
## 14 GO:0046487         glyoxylate metabolic process        27          22
## 15 GO:0001525                         angiogenesis       317         271
## 16 GO:0006635            fatty acid beta-oxidation        62          52
## 17 GO:1902600 hydrogen ion transmembrane transport        74          63
## 18 GO:0035987      endodermal cell differentiation        32          29
## 19 GO:0046951     ketone body biosynthetic process         7           7
##    Expected Rank in weight01 classic weight01    elim
## 14    20.96               14 0.41799  0.00013 0.00013
## 15   246.10               15 0.00025  0.00019 0.00070
## 16    48.13               16 0.15108  0.00032 7.7e-05
## 17    57.45               17 0.07421  0.00034 0.00398
## 18    24.84               18 0.05130  0.00037 0.00487
## 19     5.43               19 0.16988  0.00039 0.00039
showSigOfNodes(GOdata.bp, score(resultKS.weight), firstSigNodes = 10, useInfo = 'all')
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 126 
## Number of Edges = 224 
## 
## $complete.dag
## [1] "A graph with 126 nodes."
title(main = "GO analysis using Weight01 algorithm", line = -2)

In this plot we can observe the relationship between the top 10 significant gene ontologies, using the Weight01 algorithm.

2 SessionInfo

Here are the packages and the versions used to analyse these data and build this page:

sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices datasets  utils    
## [8] methods   base     
## 
## other attached packages:
##  [1] sva_3.22.0           genefilter_1.56.0    mgcv_1.8-16         
##  [4] nlme_3.1-128         edgeR_3.16.4         limma_3.30.6        
##  [7] geneplotter_1.52.0   annotate_1.52.0      XML_3.98-1.5        
## [10] AnnotationDbi_1.36.0 IRanges_2.8.1        S4Vectors_0.12.0    
## [13] lattice_0.20-34      Biobase_2.34.0       BiocGenerics_0.20.0 
## [16] BiocStyle_2.2.1     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8        RColorBrewer_1.1-2 bitops_1.0-6      
##  [4] tools_3.3.2        digest_0.6.10      RSQLite_1.1       
##  [7] evaluate_0.10      memoise_1.0.0      Matrix_1.2-7.1    
## [10] DBI_0.5-1          yaml_2.1.14        stringr_1.1.0     
## [13] knitr_1.15.1       locfit_1.5-9.1     rprojroot_1.1     
## [16] grid_3.3.2         survival_2.40-1    rmarkdown_1.2     
## [19] magrittr_1.5       backports_1.0.4    codetools_0.2-15  
## [22] htmltools_0.3.5    splines_3.3.2      xtable_1.8-2      
## [25] stringi_1.1.2      RCurl_1.95-4.8